Abstract

Background: The maintenance of chromosomal integrity is an essential task of every living organism and cellular 
repair mechanisms exist to guard against insults to DNA. Given the importance of this process, it is expected that 
DNA repair proteins would be evolutionarily conserved, exhibiting minimal sequence change over time.
However, BRCA1, an essential gene involved in DNA repair, has been reported to be evolving rapidly despite the 
fact that many protein-altering mutations within this gene convey a significantly elevated risk for breast and ovarian 
cancers. 

Results: To obtain a deeper understanding of the evolutionary trajectory of BRCA1, we analyzed complete BRCA1 
gene sequences from 23 primate species. We show that specific amino acid sites have experienced repeated 
selection for amino acid replacement over primate evolution. This selection has been focused specifically on 
humans and our closest living relatives, chimpanzees (Pan troglodytes) and bonobos (Pan paniscus). After examining
BRCA1 polymorphisms in 7 bonobo, 44 chimpanzee, and 44 rhesus macaque (Macaca mulatta) individuals, we find
considerable variation within each of these species and evidence for recent selection in chimpanzee populations. 
Finally, we also sequenced and analyzed BRCA2 from 24 primate species and find that this gene has also evolved 
under positive selection. 

Conclusions: While mutations leading to truncated forms of BRCA1 are clearly linked to cancer phenotypes in 
humans, there is also an underlying selective pressure in favor of amino acid-altering substitutions in this gene. 
A hypothesis in which viruses are the drivers of this natural selection is discussed.

Background

Defects in the BRCA1 or BRCA2 genes are responsible for 
most hereditary forms of breast cancer and account for as 
many as 10% of all breast cancer cases [1]. Women with a 
strong family history of cancer who possess a harmful 
BRCA1 or BRCA2 allele are at high risk for developing 
breast cancer within their lifetime (80% and 60%, respectively)
[2,3]. In addition, BRCA1 mutation carriers have a
30-40% chance of developing ovarian cancer, while BRCA2 
mutations also increase the risk of ovarian, pancreatic, 
prostate, and male breast cancer [2]. Cancers occur when 
heterozygous individuals experience a somatic loss of heterozygosity
event at the BRCA1 or BRCA2 locus, leaving
only the abnormal allele intact. Because both gene products
play a critical role in key cellular processes such as 
DNA repair, cell cycle control, and transcriptional regu- 
lation, it is clear why inactivating mutations are so detri- 
mental. The importance of these proteins is further 
evidenced by the fact that both BRCA1 and BRCA2 null 
mice are embryonic lethal [4]. 

Given their indispensable functions in maintaining the
integrity of the genome, one might expect strict evolu- 
tionary conservation of BRCA1 and BRCA2 over time. 
Indeed, some regions of BRCA1 have experienced purifying
selection strong enough to operate even on synonymous
mutations [5]. However, contrary to this line of
reasoning, a number of groups have documented the rapid 
evolution of BRCA1 [6-11] and BRCA2 [10] in mammals. 
Rapid evolution occurs when a gene experiences positive 
natural selection for new, advantageous mutations that
arise in a population. Because advantageous mutations
commonly involve a change in protein sequence (non-syn- 
onymous mutations), recurrent rounds of positive selection 
in a gene lead to rapid evolution of the encoded protein se- 
quence over time. For BRCA1, the evolutionary rate was 
particularly elevated on the branches leading to humans 
and chimpanzees (Pan troglodytes) [6]. The identification of 
this signature in BRCA1 suggests that some alleles and 
polymorphisms currently circulating within the human 
population may offer a selectable advantage. However, both 
the cause and consequence of this unexpected mode of 
evolution seen in BRCA1 remain unknown. 

Here, we report an extensive evolutionary analysis of the 
primate BRCA1 gene. In previous studies of BRCA1 evolution,
only exon 11 was examined with a limited number
of primate species included in the analyses [6-11]. To ex- 
tend previous studies, we have generated full-length 
BRCA1 sequences for 17 additional primate species. Using 
this more extensive dataset, we validate the finding of 
positive selection in humans and their closest ape relatives 
(in our study, chimpanzees and also bonobos (Pan panis- 
cus)). We also show that specific codons in BRCA1 have 
experienced recurrent positive selection over evolutionary 
time, both within and outside of exon 11, resulting in a 
small number of highly variable residue positions in an 
otherwise highly conserved protein. In addition, we sequenced
exon 11 of BRCA1 from populations of chimpanzee,
bonobo, and rhesus macaque (Macaca mulatta)
individuals and found that several unique polymorphisms 
exist within these populations. Two polymorphisms in the 
chimpanzee population were found to be in Hardy-Weinberg
disequilibrium, suggesting that selection may
still be operating on this gene in modern times. Lastly, 
exon 11 of BRCA2, another important genetic determin- 
ant for hereditary breast and ovarian cancers, was also se- 
quenced from diverse primate species. This gene also 
bears the surprising signature of positive selection. It is 
unclear why these critical genes bear this unusual evolu- 
tionary signature, but we present one possible hypothesis 
involving interactions between DNA repair proteins and 
viruses. 

Results 

BRCA1 is evolving under positive selection in primates 

To expand our understanding of the positive selection 
shaping BRCA1 in primates, we obtained cell lines from 
17 simian primate species, harvested total RNA, and cre- 
ated cDNA libraries. From these, the 5.6 kilobase full- 
length coding region of BRCA1 was sequenced. These 
sequences were combined with full-length BRCA1 se- 
quences from six primate species with available genome 
projects, creating an alignment of 23 full-length BRCA1 
sequences. Seventeen of the 23 full-length sequences have
never before been analyzed (asterisks in Figure 1A). 



The type of selection that a gene has experienced can be 
inferred from its rate of accumulation of non-synonymous 
(changing the encoded amino acid; denoted dN) and syn- 
onymous (silent; dS) substitutions over time. Protein- 
altering mutations are far less likely to be tolerated than 
synonymous mutations, and so dN/dS << 1 for the vast
majority of genes encoded by human and other mamma- 
lian genomes [12]. Some genes, such as pseudogenes, 
evolve neutrally with dN/dS ~ 1 because there is not 
strong selection for or against new mutations in these 
genes. Finally, selection in favor of non-synonymous mu- 
tations results in a dN/dS > 1. These genes are classified as 
being under positive selection, and are experiencing con- 
tinued selection for "innovation" at the protein sequence 
level. In these genes, not only has the penalty against 
protein-altering mutations been relaxed, but this very type 
of mutation is being selectively retained. Using PAML 
[13], we fit the full-length BRCA1 alignment (Additional 
file 1) to models of positive selection where a subset of co- 
dons is allowed to evolve with dN/dS > 1 (M2a, M8) and 
to null models not allowing positive selection (M1a, M7,
M8a). Likelihood ratio tests revealed that the dataset fit
the positive selection models significantly better than the
null models (p < 0.05, Table 1). Thus, BRCA1 has experi- 
enced selection in favor of non-synonymous mutations 
over the speciation of simian primates. 
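
The likelihood ratio test behind these model comparisons can be illustrated with a short calculation. The following is a minimal sketch in Python, not the PAML implementation: the log-likelihood values are hypothetical placeholders (they are not the values reported in Table 1), the function names are illustrative, and the sketch assumes the conventional 2 degrees of freedom for the M1a versus M2a and M7 versus M8 comparisons.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (test statistic, p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a null (M7) and a selection (M8) model.
lnl_m7, lnl_m8 = -15230.4, -15221.9
stat, p_value = likelihood_ratio_test(lnl_m7, lnl_m8, df=2)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.2g}")
```

A p-value below the chosen threshold indicates that allowing a class of codons with dN/dS > 1 improves the fit more than expected by chance.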

We next estimated dN/dS values on each branch on 
the primate evolutionary tree using the free-ratio model 
in PAML. As expected, most branches exhibited a dN/ 
dS < 1 (Figure 1A). The branch leading to humans had 
the most elevated signal with a dN/dS of 2.79. The sec- 
ond highest value of dN/dS on the BRCA1 tree is found 
on the branch leading to the last common ancestor of 
bonobos and chimpanzees, with a dN/dS of 2.66. Be- 
cause the free-ratio model is highly parameterized, we 
next compared one-ratio and two-ratio models to deter- 
mine whether selection has differentially affected the hu- 
man, chimpanzee, and bonobo clade. As shown in 
Figure 1B, our simian primate dataset fit the two-ratio
model significantly better than the one-ratio model, with 
the human, chimpanzee, and bonobo clade exhibiting a 
dN/dS of 1.78, while all other branches had a dN/dS of 
0.59. In summary, our extended primate dataset shows 
that BRCA1 is experiencing positive selection, and that 
the most intense selection has operated on the human/ 
chimpanzee/bonobo clade. 

Based on a comparison of extant and predicted ances- 
tral sequences, humans are estimated to have accumu- 
lated 25 substitutions in the BRCA1 gene since their 
divergence from chimpanzees and bonobos six million 
years ago, 22 of which are non-synonymous (Figure 2A). 
In order to understand how unusual this is, we looked at 
the evolution of other genes, specifically ones encoding 
BRCA1-interacting proteins, along the branch leading to

Figure 1 Evolution of BRCA1 over the course of primate speciation. A. dN/dS values for each branch of the primate phylogeny were
calculated using the free-ratio model in PAML [13]. Branches exhibiting dN/dS values > 1 are shown in bold italics. Dashes (-) represent branches
where zero synonymous substitutions are predicted to have occurred. On these branches, dS = 0 and dN/dS can therefore not be calculated.
In these instances, the numbers of non-synonymous (N) and synonymous (S) substitutions predicted to have occurred along each branch are
indicated in parentheses (N:S). Of these, branches that experienced 4 or more non-synonymous substitutions are in bold italics. Asterisks indicate
new sequences generated in this study. B. The human, bonobo, and chimpanzee clade was isolated and dN/dS values were calculated using the
one-ratio and two-ratio models in PAML. The two-ratio model was a better fit as determined by the likelihood ratio test shown in the box. ω0 is
the calculated dN/dS for all branches under the one-ratio model, or for background branches under the two-ratio model, and ω1 is the dN/dS for
the isolated branches in the two-ratio model.



humans. Because we do not have extended sequence sets 
for all of these genes, we took a simpler approach. For 
each gene, we aligned the human, chimpanzee, and 
gorilla sequences and manually counted the number of 
human-specific substitutions (any position where the hu- 
man gene sequence differs from both the chimpanzee 
and gorilla gene sequence). These were categorized as 
non-synonymous (N) or synonymous (S) based on how 
they affected the codon in which they were found. When 
these values are normalized to gene size, BRCA1 has the 
highest enrichment of non-synonymous substitutions 
[(N/kb)/(S/kb)]. Care must be taken in comparing this 
metric between genes, because different genes have dif- 
ferent equilibrium codon frequencies, and therefore have 
different mutational opportunities for synonymous and 
non-synonymous mutations. However, the BRCA1 gene 
has an enrichment ratio that is more than 4-fold higher 
than any of the other genes shown (Figure 2B). 
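
The manual count described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the three coding sequences are already aligned in frame, codons containing gaps or ambiguous bases are skipped, and each human-specific change is classified by comparing the human codon with the same codon carrying the chimpanzee base at that position (a simplification that treats each substitution independently). The toy sequences and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_human_specific(human, chimp, gorilla):
    """Count human-specific substitutions as (non-synonymous, synonymous)."""
    human, chimp, gorilla = human.upper(), chimp.upper(), gorilla.upper()
    n_count = s_count = 0
    for i in range(0, len(human) - 2, 3):
        h, c, g = human[i:i + 3], chimp[i:i + 3], gorilla[i:i + 3]
        # Skip codons with gaps, ambiguous bases, or missing data.
        if any(codon not in CODON_TABLE for codon in (h, c, g)):
            continue
        for pos in range(3):
            if h[pos] != c[pos] and h[pos] != g[pos]:
                # Assumed ancestral codon: human codon with the chimpanzee base.
                ancestral = h[:pos] + c[pos] + h[pos + 1:]
                if CODON_TABLE[ancestral] == CODON_TABLE[h]:
                    s_count += 1
                else:
                    n_count += 1
    return n_count, s_count


def enrichment_ratio(n, s, gene_length_bp):
    """(N/kb)/(S/kb); returns infinity when no synonymous changes are seen."""
    kb = gene_length_bp / 1000.0
    if s == 0:
        return float("inf")
    return (n / kb) / (s / kb)


# Toy alignment (hypothetical): one non-synonymous and one synonymous change.
print(count_human_specific("ATGGAAAGTCTA", "ATGGGAAGCCTA", "ATGGGAAGCCTA"))
```

Because both counts are divided by the same gene length, the enrichment ratio for a single gene reduces to N/S; the normalization matters only when N/kb or S/kb is compared between genes on its own.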

BRCA1 encodes a 220 kDa protein with two conserved 
domains: an N-terminal RING domain and two tandem 
C-terminal BRCT domains (Figure 2C). The RING 



domain has E3 ubiquitin ligase activity that is essential 
in the DNA damage response. The BRCT motifs func- 
tion as a protein-protein interaction module that binds 
phosphorylated proteins involved in DNA repair, cell 
cycle control, chromatin remodeling, and transcription. 
There is also a coiled-coil region between these two do- 
mains. Interestingly, all but one of the non-synonymous 
substitutions predicted to have occurred in the human/ 
bonobo/chimpanzee clade fall outside of these known 
structural motifs (Figure 2C). 

Human variation at selected sites in BRCA1 

The M8 model allows a class of codons to evolve under 
positive selection (dN/dS > 1). Ten codons were identified
as belonging to this class with a high posterior probabil- 
ity (P = 0.85 or above). These codons do not lie in the re- 
gion of BRCA1 where it was previously reported that 
selection might be acting against synonymous mutations 
[5], potentially giving rise to a false signature of dN/dS >
1. Instead, all 10 sites show high variability between primate
species at the protein level, often encoding very

Table 1 PAML Analysis of BRCA1 and BRCA2

Figure 2 BRCA1 evolution in the human, bonobo, and chimpanzee clade. A. dN/dS values for BRCA1 were calculated on each branch of the
primate tree using the free-ratio model in PAML. dN/dS values > 1 are shown in bold italics. The numbers of non-synonymous (N) and synonymous (S)
substitutions predicted to have occurred along each branch are indicated in parentheses (N:S). The asterisk represents the last common ancestor of humans,

dissimilar amino acids (first four rows in Figure 3A). Next,
these positively selected codon positions were examined for 
variability within the human population. The Breast Cancer 
Information Core (BIC, http://research.nhgri.nih.gov/bic/) 
is a repository of human BRCA1 polymorphisms. Using this 
database, we identified single nucleotide polymorphisms 
(SNPs) at amino acid sites 170, 888, 890, 1203, and 1443 
(Figure 3A). At four of these five sites (positions 888,
890, 1203, and 1443), we find that some human BRCA1 al- 
leles encode a unique amino acid not observed in any of 
our primate sequences. In addition, SNPs known to cause 
human disease occur in six out of 10 sites. In all cases, 
these disease-linked SNPs are not amino acid-altering mu- 
tations, but rather more radical frame-shifting or nonsense 
mutations (Figure 3A). In particular, nonsense mutations 
occurring in codon 1443 are among the most common mu- 
tations documented in the BIC. In Figure 3B, all 10 sites of 
positive selection were mapped onto a domain diagram of 
BRCA1 (bottom) along with the most common human 
non-synonymous SNPs found in the BIC (top). As de- 
scribed previously for mutations accumulated in the 
human/chimpanzee/bonobo clade, all but one of the 
positively selected residues (1370S in the coiled-coil 



domain) lie outside of any known structural motifs. In 
summary, the 10 codon positions identified in this analysis 
are highly variable between primate species and within the 
human population, and are involved in the etiology of can- 
cers associated with this gene. Disease-associated SNPs at 
these sites tend to be radical, protein-truncating muta- 
tions. However, a presumably distinct phenomenon ap- 
pears to be driving selection in favor of non-synonymous 
point mutations at these positions. 

BRCA1 variation in other primate populations 

So far, we have documented sequence differences be- 
tween the BRCA1 proteins of different primate species. 
We have shown that non-synonymous substitutions are 
accumulating in BRCA1 faster than expected under con- 
strained, or even neutral, evolution. We next wished to 
explore whether positive selection is still acting on 
BRCA1 in modern populations. There is already evi- 
dence that this is true in the human population, because 
several BRCA1 SNPs have been found to depart from 
Hardy-Weinberg equilibrium in European populations
[14,15] and in Australia [6]. We wished to determine if 
the same might be true in bonobo and chimpanzee

Figure 3 Specific codons in BRCA1 have experienced positive selection during primate speciation. A. Shown are the ten codons that have
evolved under positive selection (dN/dS > 1) in primates with a P > 0.85. Codons with a P > 0.95 are indicated with asterisks. The amino acids
encoded at these positions in human BRCA1 are shown, along with those found in hominoids, old world monkeys, and new world monkeys. In
addition, human SNPs and disease mutations also found at these sites are listed. X refers to a single nucleotide mutation that results in a

populations. We amplified and sequenced the largest
BRCA1 exon, exon 11, which is ~3.4 kilobases and comprises
~61% of the BRCA1 coding region, from the genomic
DNA of seven bonobo and 44 chimpanzee individuals
(Table 2). In bonobos, we found nine polymorphic sites, 
eight of which were single nucleotide polymorphisms 
(SNPs), with three of these being non-synonymous. Eight 
of the SNPs were in Hardy-Weinberg equilibrium.



Interestingly, one bonobo individual was also homozygous 
for a seven amino acid deletion (Δ1058-1064) (Table 2).
Hardy-Weinberg equilibrium was rejected for this polymorphism,
although the support was weak and did not
survive correction for multiple testing (Table 2). The 
chimpanzee sequence set revealed nine SNPs, seven of 
which were non-synonymous. Interestingly, in this larger 
sample set (n = 44), three of the non-synonymous SNPs 



Table 2 SNP Analysis of BRCA1 in Bonobo, Chimpanzee, and Rhesus Macaque Individuals

SNP (a)       | Genotype counts (hom / het / hom) | p-value (b) | Human (c) | Human variants, BIC (d)  | Human variants, 1000 Genomes (e)
              |                                   |             | K         | K309T                    | K309Q, K309T
E427K         | 34 / 9 / 1                        | 0.663       | E         |                          |
S578S         | 40 / 4 / 0                        | 0.752       | S         | S578Y                    | S578Y
G590S (hwd)   | 20 / 12 / 12                      | 0.004       | S         | S590G                    | S590G
K731E         | 19 / 16 / 9                       | 0.122       | K         | delAGAAG*                | delAGAAG*
I925T         | 34 / 9 / 1                        | 0.663       |           | I925L                    | I925V, I925L, insT*
S1042S        | 41 / 3 / 0                        | 0.823       | S         |                          |
G1077R (hwd)  | 42 / 1 / 1                        | 1.4E-5      | G         |                          | G1077W, G1077G
G1100E        | 20 / 16 / 8                       | 0.155       | G         |                          |
Rhesus n = 44
A225A         | 42 / 2 / 0                        | 0.888       | A         |                          |
N375S         | 43 / 1 / 0                        | 0.920       | N         | delA*, N376S             | delA*, N376S
R466R         | 42 / 2 / 0                        | 0.888       | K         | K467X*                   | K467X*
T487S         | 43 / 1 / 0                        | 0.920       | T         | insA*                    | insA*
N684N         | 29 / 14 / 1                       | 0.647       | N         |                          |
V739M         | 38 / 6 / 0                        | 0.624       | V         | V740L                    | V740L, insA*
D773G         | 29 / 15 / 0                       | 0.173       | G         |                          |
D852D         | 40 / 4 / 0                        | 0.752       | D         | insA*                    | insA*
N923H         | 40 / 4 / 0                        | 0.752       | N         |                          |
K936K         | 40 / 4 / 0                        | 0.752       | K         |                          |
A1167E        | 40 / 4 / 0                        | 0.752       | A         |                          |
Q1203R        | 29 / 14 / 1                       | 0.647       | R         | R1203Q, R1203G, R1203X*  | R1203Q, R1203G, R1203X*

a Numbering refers to the amino acid position in the respective primates. In the case of rhesus macaques, amino acids 375 to 936 correspond to amino acids 376 to 937 in humans.

b p-values were calculated using a chi-squared test with df = 1. A p-value cutoff (after Bonferroni correction) of < 0.0056, 0.0056, and 0.0042 for bonobos, chimpanzees, and rhesus macaques, respectively, was considered statistically significant. Tests that survived this correction have the p-value listed in italics.

c Amino acid found in the human BRCA1 protein at each of the positions listed.

d Human variants found at the positions indicated in the Breast Cancer Information Core.

e Human variants found at the positions indicated in the 1000 Genomes database.

* Known human disease-causing variant.

hwd SNPs found to be in Hardy-Weinberg disequilibrium.
were found to be in Hardy-Weinberg disequilibrium, suggesting
that selection is acting either for (E309K and
G590S) or against (G1077R) these mutations. The support
for one of these (E309K) was weak and did not survive
correction for multiple testing (Table 2). It is particularly
intriguing that humans share with chimpanzees
this same S/G SNP at position 590. In both the bonobo
and chimpanzee populations, all synonymous SNPs were
in Hardy-Weinberg equilibrium.
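
The Hardy-Weinberg test reported in Table 2 can be reproduced from the genotype counts. The sketch below, in Python, assumes that the three counts listed for each SNP are homozygous, heterozygous, and homozygous genotypes (the statistic is symmetric in the two homozygous classes), and uses a chi-squared goodness-of-fit test with df = 1 as stated in the table footnote. The Bonferroni helper assumes a nominal alpha of 0.05 divided by the number of polymorphic sites tested per species; this is a reading of the cutoffs in the footnote rather than something the text states explicitly. Function names are illustrative.

```python
import math


def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-squared test (df = 1) of genotype counts against HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance cutoff after Bonferroni correction."""
    return alpha / n_tests


# Chimpanzee G590S genotype counts from Table 2.
chi2, p_value = hardy_weinberg_test(20, 12, 12)
cutoff = bonferroni_cutoff(0.05, 9)  # nine chimpanzee SNPs (assumed alpha = 0.05)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, significant = {p_value < cutoff}")
```

With these counts the p-value rounds to 0.004, in line with the value listed for G590S, and falls below the 0.0056 cutoff.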

We also sequenced exon 11 from 44 rhesus macaque in- 
dividuals. Rhesus macaques are not part of the human/ 
chimpanzee/bonobo clade and are instead distantly related
members of the Old World monkey clade (Figure 1A). In 
these macaques, we found 12 SNPs in BRCA1, with seven 
being non-synonymous (Table 2). This includes a SNP 
found at position 1203, a site of positive selection in the 
inter-species dataset. This codon is also the site of a known 
disease-linked mutation in humans; however, the cancer-linked
SNP at this position introduces a stop codon. Nonetheless,
all of these SNPs are in Hardy-Weinberg equilibrium.

Caution must be used when interpreting signatures of 
selection acting on polymorphisms in primate popula- 
tions. When sampling primates, it is not possible to get 
completely random and unrelated population sets. Deviations
from Hardy-Weinberg equilibrium may occur
due to factors other than selection. Reasons for
rejecting Hardy-Weinberg equilibrium that are unrelated to
selection on the tested site include 1) non-random
mating, 2) small population sizes, which magnify
the effects of genetic drift, 3) introduction of new alleles,
4) population subdivision or admixture, 5) biases in sequencing
errors, and 6) linkage disequilibrium with another
locus under selection. Because the chimpanzee
population consists of individuals from two different
subspecies, admixture could plausibly lead to rejection
of Hardy-Weinberg equilibrium.

We also performed the McDonald-Kreitman and Tajima's D
tests on our datasets (data not shown). The tests
were not significant and therefore do not support selection
acting on any of these polymorphisms. False conclusions
in these tests can again result from a population with
hidden structure. In summary, while the analyses using
the simian primate dataset consisting of 23 species suggest
that recurrent positive selection has been acting on
BRCA1 over the course of several million years, the
Hardy-Weinberg equilibrium tests performed here and
by others indicate that selection is acting on modern-day
humans, and possibly also chimpanzees.

BRCA2 is also evolving under positive selection in primates 

Because of the rapidly evolving nature of BRCA1, we 
also completed an evolutionary analysis of BRCA2, an- 
other strong determinant for hereditary breast and ovar- 
ian cancer. Although BRCA2 has been shown to be 
under positive selection, only a small number of primate 



species was included in this study [10]. We sequenced 
the ~5 kilobase exon 11 from 18 primate species. Exon 
11 is the largest of 27 exons and encodes about 50% of 
the entire BRCA2 protein. The sequences, along with six 
additional sequences from available genome projects, 
were assembled into a multiple alignment (Additional 
file 2). We fit the alignment to positive selection and 
null models as described above. The positive selection 
models were again a significantly better fit to the se- 
quence set than the null models, with a p value < 0.0003 
(Table 1). In summary, BRCA2 is under positive selec- 
tion in primates as well, although this signature appears 
not to be concentrated on the human/chimpanzee/bonobo
clade (Additional file 3).

In contrast to BRCA1, BRCA2 is a 390 kDa nuclear pro- 
tein that is exclusively involved in the homologous recom- 
bination pathway for repairing double-strand breaks. The 
eight BRC motifs and the extreme C terminus mediate in- 
teractions with and recruitment of Rad51, a protein that 
catalyzes strand invasion during homologous recombin- 
ation [16-18]. All eight BRC repeats are encoded within 
exon 11. The M8 model estimates that five codons are 
evolving under positive selection with posterior probabil- 
ity > 0.85 (Figure 4A). Two of these positively selected 
sites were found to have a human polymorphism docu- 
mented in the BIC (Figure 4A). When all five sites of posi- 
tive selection are mapped onto a domain diagram of 
BRCA2 (Figure 4B), they cluster within the first three 
BRC domains (1008, 1225, and 1426) and the intervening 
regions (1159 and 1272). To examine this further, we 
aligned the amino acid sequence of all eight BRC repeats 
of human BRCA2 and highlighted sites 1008, 1225, and 
1426 (Figure 5A). Surprisingly, all three sites of positive 
selection lie adjacent to a hydrophobic motif (FxxA) 
known to mediate interactions with Rad51 (Figure 5A, red
box). Since the co-crystal structure of the BRCA2 BRC4
in complex with Rad51 is available, we mapped these 
three sites to their analogous positions in BRC4 and found 
that they are in close proximity to the Rad51 binding 
interface (Figure 5B, PDB: 1N0W) [19]. The clustering of 
these residues near this interface might provide a clue to 
the driver of natural selection at these sites. 

Discussion 

Nearly all known cases of recurrent positive selection in 
primate genomes involve genes in one of three categories: 
1) immunity, 2) environmental perception (such as odorant
and taste receptors), or 3) sexual selection and mate
choice [21,22]. This is because ever-changing
external stimuli (e.g., pathogens and environmental odors or
tastes) drive the selection of new allelic variants. For
example, immunity factors that are constantly challenged 
by pathogens exhibit some of the most striking signatures 
of positive selection seen in primate genomes [23-28].

Figure 4A (tabular content):

Site              | 1008 | 1159* | 1225* | 1272* | 1426*
Human BRCA2       | S    | C     | T     | M     | T
Hominoids         | S/G  | C     | S/T/A | M/V   | T/K
Old World Monkeys | S/T  | N/K   | V/I   | M/V   | I/L/V
New World Monkeys | R/S  | Y/R   | V/I   | M/V   | I/L
Human SNPs        |      |       |       | V     |
Disease Mutations |      |       |       |       | insA

Figure 4B (labels): most common human variants Y42C, H372N, S384F, I505T, F599S, P655R, 6174delT, D1420Y, I2490T, E2856A, K3326X; domains BRC repeats, helical DBD, OB folds, NLS; sites of positive selection S1008, C1159, T1225, M1272, T1426.

Figure 4 Codons in exon 11 of BRCA2 that have experienced positive selection in primates. A. 5 codons in exon 11 of BRCA2 were found
to be under positive selection in primates. All sites had a P>0.95 (indicated with asterisks) except for S1008 (P = 0.9). The amino acid encoded by
human BRCA2 at each of these codons is shown. The amino acids encoded by hominoids, old world monkeys, and new world monkeys are also
shown. Human SNPs and disease mutations deposited to the BIC are listed at the bottom. B. A domain diagram of BRCA2 is depicted with the 8
BRC repeats, helical DNA binding domain (helical DBD), OB folds, and nuclear localization signals (NLS). Only exon 11 was sequenced in this study
(section in white). The sites of positive selection are represented as triangles at the bottom of the diagram. The 11 most common protein-altering
variants in the BIC are marked as stars at their respective locations at the top. Black stars correspond to disease-causing mutations, white stars are
variants with no known clinical significance, and grey stars are positions with unknown significance.



Here, immunity genes will experience positive selection for 
protein-altering mutations that improve recognition of a 
relevant pathogen. Conversely, the pathogen will counter- 
evolve to escape detection, again placing selective pressure 
on the host population for new mutations that improve the 
immunity protein. This cycle can repeat itself indefinitely, 
resulting in an ever-escalating host-virus arms race. There- 
fore, it is surprising to see that BRCA1 and BRCA2, genes 
that do not classically fit into any of the three categories 
listed above, are evolving in a similar manner to these 
highly adaptive immunity genes. In addition to the two de- 
scribed here, other DNA repair genes have also been shown 
to evolve under positive selection [29,30], but the driver be- 
hind this unusual finding remains to be identified. 

An intense battle exists between host DNA repair ma- 
chinery and viruses, and we propose that this could con- 
tribute to the evolutionary signatures documented here. 
Many viruses are known to interact with the DNA repair 
machinery and cell cycle regulators [31,32]. One funda- 
mental issue is that the free ends of viral genomes are 
exposed, in contrast to the host's DNA, which is capped 



by telomeres. Despite this, many viruses need to access 
the nucleus where the host's DNA repair machinery rec- 
ognizes these un-capped viral genome ends as "damaged" 
cellular DNA, activating the DNA damage response. In 
order for productive infection to proceed, viruses must ac- 
tively thwart these host repair pathways. For example, 
DNA repair proteins interfere with the adenovirus life- 
cycle by concatenating the ends of newly synthesized viral 
DNA, inhibiting efficient packaging into viral progeny 
[33]. In turn, adenovirus has evolved a way around this 
blockade by encoding proteins that mislocalize or degrade 
the specific host factors involved. Depending on the virus 
involved, host DNA repair factors can also be hijacked to 
facilitate viral replication. For instance, herpes simplex 
virus-1 simultaneously activates DNA repair constituents
that aid in viral genome replication [34,35] and counter- 
acts those that do not [36,37]. Human immunodeficiency 
virus 1 is also known to activate the DNA damage re- 
sponse and manipulate cell cycle checkpoints through the 
actions of its accessory protein Vpr [38,39]. Additionally, 
several studies have shown that specific DNA repair

Figure 5B (labels): BRCA2 BRC4, Thr1225, Rad51.

Figure 5 The sites of positive selection lying within the BRC repeats of BRCA2 are located adjacent to the Rad51 binding region. A. The
8 BRC repeats of the human BRCA2 protein were aligned using ClustalX. The red and peach colored boxes are the motifs within the BRC repeats
thought to facilitate binding with Rad51 [20]. Residues 1008, 1225, and 1426 are colored in green, orange, and yellow, respectively. All three sites
lie just adjacent to the FxxA motif which interacts with two hydrophobic pockets in the Rad51 oligomer. B. The co-crystal structure of BRC4 (blue)
in complex with Rad51 (grey) is shown (PDB ID 1N0W [19]). The FxxA motif is depicted in red. Residues 1008, 1225, and 1426 are shown in green,
orange, and yellow, respectively.



proteins play critical roles in retroviral genome integration 
[40-43] while others seem to decrease the efficiency of in- 
fection [44-46]. 

One can imagine that these and other viruses that access 
the nucleus during replication could feasibly interact 
with BRCA1 or BRCA2, driving the selection of variants 
that ultimately lead to decreased susceptibility to 
infection. However, it is possible that variant alleles 
selected for this purpose would have detrimental 
consequences to protein function in the context of host 
DNA repair. Most of the deleterious BRCA1 and BRCA2 
variants characterized thus far introduce stop codons or 
frame-shifts that result in premature truncation of the 
protein, the consequences of which manifest as cancer at 
relatively early ages. The effects of non-synonymous 
point mutations, such as those documented here, might be 
expected to be much more subtle. The effects of subtle 
mutations are more difficult to assess because the 
resulting genomic instability may only be realized later 
in life and can be confounded by other genetic or 
environmental influences. We therefore propose a 
hypothesis where viruses are driving the intriguingly 
rapid rate of evolution seen in BRCA1 and BRCA2, 
potentially giving rise to antagonistic pleiotropy. This 
would be analogous to the malaria and sickle cell anemia 
trade-off that is well documented [47]. 

Conclusions 

The BRCA1 and BRCA2 proteins play key roles in the 
repair of damage to chromosomal DNA. We have expanded 
the analysis of the evolution of these genes, showing 
that both have been subject to recurrent positive 
selection during simian primate speciation. Although the 
force or forces driving the diversifying selection of 
these genes are unknown, the result is that the sequence 
of these proteins has been altered in humans and our 
closest living relatives. It remains to be seen whether 
this is an instance of antagonistic pleiotropy, where 
positive selection driven by one force causes functional 
consequences in another context, potentially the 
formation of cancers [48]. 

Methods 

Non-human primate samples 

Of the 44 chimpanzee samples evaluated in this study, 34 
were obtained from the Chimpanzee Biomedical Research 
Resource (NIH8U42OD011197-13), which is supported 
through a cooperative agreement with the National 
Institutes of Health (NIH). This NIH-supported colony is 
housed at the MD Anderson Cancer Center's Michale E. 
Keeling Center for Comparative Medicine and Research 
(KCCMR) in Bastrop, TX. The origins of the chimpanzees 
comprising the KCCMR colony are highly diverse, with only 
a few closely related (siblings/offspring) animals in the 
colony (Additional file 4). Blood from these 34 chimpanzees 
was collected directly into PAXgene Blood RNA Tubes 
(PreAnalytiX) at the same time other blood samples were 
obtained as part of the prescheduled annual veterinary 
exam for each animal. Another 10 chimpanzee genomic DNA 
samples were purchased from Coriell (Additional file 5). 

All 44 rhesus macaque samples evaluated in this study 
were obtained from animals housed at the KCCMR in 
collaboration with researchers at this institution. The 
colony at the KCCMR is a closed breeding colony comprised 
of approximately 980 rhesus macaques of Indian origin 
that originated from a colony of 286 founder animals in 
1988 (degree of relatedness can be found in Additional 
file 6). Blood from these animals was collected directly 
into PAXgene Blood RNA Tubes (PreAnalytiX) at the same 
time other blood samples were obtained as part of the 
prescheduled annual veterinary exam for each animal. 

Bonobo genomic DNA samples were obtained from the 
Integrated Primate Biomaterials and Information Resource 
(IPBIR) of the Coriell Institute or extracted from blood 
samples obtained from the Columbus Zoo and the Language 
Research Center, Georgia State University. All seven 
individuals are unrelated (Additional file 7). 

The remaining non-human primate samples were acquired 
as cell lines purchased from the Coriell Institute 
under a U.S. Fish and Wildlife Service permit (sources 
and unique identifiers are listed in Additional file 8). 
This study was approved by the University of Texas at 
Austin Institutional Review Board. 



Primate BRCA1 and BRCA2 sequencing 

Human BRCA1 and BRCA2 coding sequences were obtained 
from GenBank (accession numbers NM_007294 and 
NM_000059, respectively). BRCA1 and BRCA2 sequences 
from chimpanzee, gorilla, orangutan, rhesus macaque, and 
marmoset were obtained using the BLAT alignment tool 
on the UCSC genome database (http://genome.ucsc.edu/). 
For the remaining 18 primate sequences, primary or 
immortalized cell lines were grown in standard media 
supplemented with 15% fetal bovine serum at 37°C and 5% 
CO2. Cells were collected and RNA was extracted using 
the AllPrep DNA/RNA kit (QIAGEN). cDNA libraries 
were generated using the SuperScript III First-Strand 
Synthesis Kit (Invitrogen) with oligo dT or random 
hexamer primers. PCR products were generated using PCR 
SuperMix High Fidelity (Invitrogen) and directly sequenced 
or cloned into pCR4 for sequencing. Primers used for PCR 
and sequencing can be found in Additional files 9, 10, 11 
and 12. These sequences have been deposited in GenBank 
(accession numbers KM017616-KM017652). 

Blood from rhesus macaque and chimpanzee individuals 
was collected in PAXgene Blood RNA Tubes (PreAnalytiX). 
RNA was extracted using the PAXgene Blood miRNA Kit 
(QIAGEN) and genomic DNA was obtained using the AllPrep 
DNA/RNA kit (QIAGEN). BRCA1 exon 11 was amplified from 
extracted genomic DNA (chimpanzee, bonobo, and rhesus 
macaque) using PCR SuperMix High Fidelity (Invitrogen) 
and sequenced. Details on PCR and sequencing primers can 
be found in Additional files 9 and 10. 

PAML analysis 

A multiple sequence alignment was generated for BRCA1 
and BRCA2 using ClustalX 2.1 [49]. The alignments are 
straightforward, with only a few small indels (Additional 
files 1 and 2). Gene sequences at each ancestral node were 
reconstructed using the codeml program in PAML 4.3 [50]. 
dN/dS values along each branch of the phylogenetic tree 
were calculated using the free-ratio model. Substitution 
counts given along specified branches are the estimates 
made in the free-ratio model, but were also calculated by 
directly comparing the predicted ancestral and the known 
extant sequences and counting differences manually. Both 
methods yielded the same values. The one-ratio and 
two-ratio models were performed as described previously [51]. 
To detect selection, multiple alignments were fit to the 
NSsites models M1a (null model, codon values of dN/dS 
are fit into two site classes, one with a value between 0 
and 1, and one fixed at dN/dS = 1), M2a (positive selection 
model, similar to M1a but with an extra codon class of 
dN/dS > 1), M7 (null model, codon values of dN/dS fit to a 
beta distribution bounded between 0 and 1), M8a (null 
model, similar to M7 except with an extra fixed codon class 
at dN/dS = 1), 
and M8 (positive selection model, similar to M7 but with 
an extra class of dN/dS > 1). Model fitting was performed 
with multiple seed values for dN/dS (ω) and assuming 
either the F61 or F3x4 model of codon frequencies [52]. 
Likelihood ratio tests were performed to assess whether 
permitting some codons to evolve under positive selection 
gives a significantly better fit to the data than models where 
positive selection is not allowed [53,54]. These different 
model comparisons represent different trade-offs between 
power and accuracy [55]. In all cases the positive selection 
model was a significantly better fit (p < 0.05), and individual 
codons assigned to the dN/dS > 1 class with high posterior 
probabilities (P > 0.85 by Bayes Empirical Bayes [56]) 
were analyzed. The crystal structure was obtained from 
the RCSB Protein Data Bank (http://www.pdb.org) and 
residues under positive selection were mapped using 
MacPyMOL (http://www.pymol.org). 
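
The likelihood ratio tests described above can be illustrated with a minimal sketch. It is not the code used in this study: the log-likelihood values are hypothetical placeholders, the function and variable names are invented for illustration, and the degrees of freedom (2 for M1a vs. M2a and for M7 vs. M8, 1 for M8a vs. M8) are the values conventionally used for these codeml comparisons, assumed here rather than taken from the text.

```python
import math

# Degrees of freedom conventionally assumed for each null/alternative pair
# (an assumption of this sketch, not stated in the Methods).
COMPARISON_DF = {("M1a", "M2a"): 2, ("M7", "M8"): 2, ("M8a", "M8"): 1}


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (closed forms for df 1 and 2)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    # The alternative model nests the null, so the statistic cannot be
    # truly negative; clamp small negative values caused by optimisation noise.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p = likelihood_ratio_test(-10000.0, -9990.0, COMPARISON_DF[("M7", "M8")])
    print(f"2*delta_lnL = {stat:.2f}, p = {p:.3g}")
```

Closed-form tail probabilities are used so that the sketch needs only the standard library; a statistics package would be the more usual choice for arbitrary degrees of freedom.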

Hardy-Weinberg equilibrium test 

Single nucleotide polymorphisms (SNPs) were annotated 
for each bonobo, chimpanzee, and rhesus macaque 
individual. Allele frequencies were calculated for each SNP 
and tested for departure from Hardy-Weinberg equilibrium 
(http://www.oege.org) [57]. Chi-squared values were 
calculated using 1 degree of freedom. A p-value (after 
Bonferroni correction) < 0.0056, 0.0056, and 0.0042 for 
bonobos, chimpanzees, and rhesus macaques, respectively, 
was considered statistically significant. 
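
The Hardy-Weinberg test above can be sketched as follows. This is an illustration only, not the online calculator used in the study: the genotype counts are made up, the function names are hypothetical, and the numbers of tests (9 and 12) are inferred from the reported thresholds under the assumption of a family-wise significance level of 0.05.

```python
import math


def hwe_chi_squared(n_hom_ref, n_het, n_hom_alt):
    """Return (chi2, p_value) for a biallelic SNP from observed genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the reference allele
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    # Skip classes with zero expectation (monomorphic sites) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical genotype counts for one SNP in 44 individuals.
    chi2, p_value = hwe_chi_squared(30, 10, 4)
    threshold = bonferroni_threshold(0.05, 9)  # assumed: 9 SNPs tested
    print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, significant = {p_value < threshold}")
```

Dividing 0.05 by 9 and by 12 gives thresholds that round to 0.0056 and 0.0042, matching the values reported above.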

Ethics 

No new human data was generated or analyzed in this 
study. 

Additional files 



Additional file 7: Pan paniscus (bonobo) individuals information. 
Description - sex and sources of bonobo samples used in this study. 

Additional file 8: Sources and unique identifiers of cell lines used 
to generate primate cDNA libraries and sequences. Description - 
sources and unique identifiers of cell lines used in this study. 

Additional file 9: Primers used for BRCA1 amplification and 
sequencing. Description - primers used to amplify and sequence BRCA1. 

Additional file 10: Sequences of primers used for BRCA1 
sequencing. Description - sequences of primers used to amplify and 
sequence BRCA1. 

Additional file 11: Primers used for BRCA2 amplification and 
sequencing. Description - primers used to amplify and sequence BRCA2. 

Additional file 12: Sequences of primers used for BRCA2 
sequencing. Description - sequences of primers used to amplify and 
sequence BRCA2. 